\section{Binary Code Features}
We use four types of binary code features [2] at two levels: the instruction level and the structural level.
Idioms and n-grams are instruction-level features, whereas graphlets and libcalls are structural-level features.

An n-gram is a short sequence of 3--4 bytes. It is the lowest-level feature type.
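
As a minimal sketch of n-gram extraction, the snippet below counts every overlapping window of $n$ bytes in a code region. The sliding-window scheme, the function name \texttt{byte\_ngrams}, and the sample bytes are illustrative assumptions and are not taken from [2].

```python
from collections import Counter


def byte_ngrams(code: bytes, n: int = 3) -> Counter:
    """Count every overlapping n-byte window in a byte string.

    Assumption: n-grams are taken with a sliding window of stride 1.
    """
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))


# Hypothetical input: x86 machine code for
#   push ebp (55); mov ebp, esp (89 e5); pop ebp (5d); ret (c3)
sample = bytes.fromhex("5589e55dc3")
counts = byte_ngrams(sample, n=3)

for gram, freq in counts.items():
    print(gram.hex(), freq)
```

Each distinct n-gram then becomes one dimension of the feature vector, with its count (or mere presence) as the value.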
Idioms are short sequences of assembly instructions and may contain wildcards.
For example, the idiom $u = (\texttt{push ebp} \mid * \mid \texttt{mov esp, ebp})$ captures the stack frame set-up performed during a procedure call.
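
The following sketch shows one way such an idiom could be matched against a disassembled instruction sequence. It assumes that instructions are already normalized to strings and that the wildcard \texttt{*} matches exactly one arbitrary instruction; the function name \texttt{count\_idiom} and the sample sequence are hypothetical.

```python
from typing import Sequence

WILDCARD = "*"  # assumption: matches exactly one arbitrary instruction


def count_idiom(instructions: Sequence[str], idiom: Sequence[str]) -> int:
    """Count the contiguous windows of `instructions` that match `idiom`."""
    k = len(idiom)
    if k == 0:
        return 0
    hits = 0
    for start in range(len(instructions) - k + 1):
        window = instructions[start:start + k]
        # A position matches if the pattern token is a wildcard
        # or is identical to the instruction at that position.
        if all(p == WILDCARD or p == ins for p, ins in zip(idiom, window)):
            hits += 1
    return hits


# Hypothetical normalized instruction sequence (AT&T operand order, as in the text).
sequence = ["push ebp", "push ebx", "mov esp, ebp", "ret"]
idiom = ["push ebp", "*", "mov esp, ebp"]

print(count_idiom(sequence, idiom))
```

Under these assumptions the idiom occurs once in the sample sequence, because the wildcard absorbs the intervening \texttt{push ebx}.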
A graphlet is defined on the control flow graph together with a coloring function $\sigma: V \rightarrow C$, where $V$ is the set of basic blocks and $C$ is a set of colors.
Each color denotes a distinct combination of instruction types occurring in a basic block.
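
To make the coloring function concrete, the sketch below computes $\sigma(v)$ as the set of instruction types present in basic block $v$. The mnemonic-to-type table, the block contents, and the names \texttt{INSTRUCTION\_TYPE} and \texttt{color} are hypothetical, since the actual instruction classes are not listed here.

```python
from typing import Dict, FrozenSet, List

# Hypothetical mnemonic-to-type table; the real instruction classes may differ.
INSTRUCTION_TYPE: Dict[str, str] = {
    "add": "arithmetic", "sub": "arithmetic",
    "mov": "data", "push": "stack", "pop": "stack",
    "jmp": "branch", "je": "branch",
    "call": "call", "ret": "return",
}


def color(block: List[str]) -> FrozenSet[str]:
    """sigma(v): the combination of instruction types found in one basic block."""
    return frozenset(INSTRUCTION_TYPE.get(m, "other") for m in block)


# Hypothetical basic blocks, each given as a list of mnemonics.
cfg_blocks = {
    "B0": ["push", "mov", "sub"],
    "B1": ["mov", "add", "je"],
    "B2": ["sub", "mov", "jmp"],
}

sigma = {v: color(mnemonics) for v, mnemonics in cfg_blocks.items()}

for v, c in sigma.items():
    print(v, sorted(c))
```

In this example B1 and B2 receive the same color because both contain arithmetic, data, and branch instructions, whereas B0 receives a different color.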
Libcall features describe whether an external library function is called from the code of interest.

